Abstract. In this paper, we explore a connection between binary hierarchical
models, their marginal polytopes, and codeword polytopes, the convex hulls of
linear codes. The class of linear codes that are realizable by hierarchical models
is determined. We classify all full dimensional polytopes with the property that
their vertices form a linear code and give an algorithm that determines them.



1. Introduction 

In theoretical statistics the marginal polytope plays an important role. It is the
polytope of possible values that a sufficient statistic can take. It encodes in its face
lattice the combinatorial structure of the boundary of the exponential family defined
by the statistic. For a model on discrete random variables it can be represented
with vertices that have only components 0 or 1, commonly called a 0/1 polytope.

In coding theory, when decoding binary linear codes one can apply techniques
from linear programming and optimize a linear function over the convex hull of the
code words, known as the codeword polytope [FWK03].

Observing that for certain choices of sufficient statistics on binary random 
variables these two notions coincide, our main contribution is a characterization of 
the corresponding polytopes. We do not address problems that are directly linked 
to coding theory. However, we do hope that our result will contribute to a better 
understanding of the closure of exponential families, which is an important problem 
in statistics. 

The paper is organized as follows: In Section 2 we introduce the necessary notions
to define hierarchical models and fix the notation. We review different descriptions
of so called interaction spaces in Section 3. In Section 4 we establish the link to
coding theory. Finally, in Section 5 we give our main result, the classification of
all full dimensional polytopes whose vertices form a linear code, and give a
recursive formula for their number.

2. Preliminaries 

2.1. Exponential Families of Hierarchical Models. Given a non-empty finite
set $\mathcal{X}$, we denote the set of probability distributions on $\mathcal{X}$ by $\overline{\mathcal{P}}(\mathcal{X})$. The support of
$p \in \overline{\mathcal{P}}(\mathcal{X})$ is defined as $\operatorname{supp}(p) := \{x \in \mathcal{X} : p(x) > 0\}$. The set of distributions with
full support is denoted as $\mathcal{P}(\mathcal{X})$. The set $\overline{\mathcal{P}}(\mathcal{X})$ has the geometrical structure of a
$(|\mathcal{X}| - 1)$-dimensional simplex lying in an affine hyperplane of

\[ \mathbb{R}^{\mathcal{X}} := \{f : \mathcal{X} \to \mathbb{R}\}, \]

the vector space of real valued functions on $\mathcal{X}$. Statistical models, such as hierarchical
models, are subsets of $\overline{\mathcal{P}}(\mathcal{X})$. In this paper, we will only consider so called exponential
families, which are smooth manifolds.



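Definition 1. The map

\[ \exp : \mathbb{R}^{\mathcal{X}} \to \mathcal{P}(\mathcal{X}), \qquad f \mapsto \frac{e^{f}}{\sum_{x \in \mathcal{X}} e^{f(x)}}, \]

is called the exponential map. It acts component wise by exponentiating and
normalizing. Then, an exponential family (in $\mathcal{P}(\mathcal{X})$) is defined as the image $\exp(\mathcal{I})$
of a linear subspace $\mathcal{I}$ of $\mathbb{R}^{\mathcal{X}}$.

An exponential family $\mathcal{E}$ naturally has full support and is therefore contained
in the open simplex $\mathcal{P}(\mathcal{X})$. However, to get probability distributions with reduced
support one has to pass to the closure $\overline{\mathcal{E}}$ with respect to the standard topology of
$\mathbb{R}^{\mathcal{X}}$.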

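Now we consider a compositional structure of $\mathcal{X}$ induced by the set $[N] :=
\{1, \ldots, N\}$. Given a subset $A \subseteq [N]$, we define

\[ \mathcal{X}_A := \{0, 1\}^A, \]

and the natural projection

\[ X_A : \mathcal{X}_{[N]} \to \mathcal{X}_A, \qquad (x_i)_{i \in [N]} \mapsto (x_i)_{i \in A}. \]

In the following, we will abbreviate $\mathcal{X} := \mathcal{X}_{[N]}$. One can view $\overline{\mathcal{P}}(\mathcal{X})$ as the set of joint
probability distributions of the binary random variables $\{X_i : i \in [N]\}$. We now use
the compositional structure of $\mathcal{X}$ in order to define exponential families in $\mathcal{P}(\mathcal{X})$
given by interaction spaces. Decompose $x \in \mathcal{X}$ in the form $x = (x_A, x_{[N] \setminus A})$
with $x_A \in \mathcal{X}_A$, $x_{[N] \setminus A} \in \mathcal{X}_{[N] \setminus A}$, and define $\mathcal{I}_A$ to be the subspace of functions that
do not depend on the configurations $x_{[N] \setminus A}$:

\[ \mathcal{I}_A := \{ f \in \mathbb{R}^{\mathcal{X}} : f(x_A, x_{[N] \setminus A}) = f(x_A, x'_{[N] \setminus A})
\text{ for all } x_A \in \mathcal{X}_A \text{ and all } x_{[N] \setminus A}, x'_{[N] \setminus A} \in \mathcal{X}_{[N] \setminus A} \}. \]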



In the following, we apply these interaction spaces as building blocks for more
general interaction spaces and associated exponential families. The definition
of a hierarchical model is based on the notion of a hypergraph [Lau96]:

Definition 2. A pre-hypergraph $\mathcal{A}$ is a non-empty subset of $2^{[N]} \setminus \{\emptyset\}$ that contains
all atoms $\{i\}$ for $i \in [N]$.

A hypergraph is a pre-hypergraph that is (inclusion) complete in the following
sense: If $A \in \mathcal{A}$ and $\emptyset \neq B \subseteq A$, it follows that $B \in \mathcal{A}$.

Remark. For technical convenience, we have defined hypergraphs to be complete. 
In this way, it is easy to define a hierarchical model for each hypergraph. However, 
the notion of a pre-hypergraph turns out to be more natural in the context of the 
polytopes and linear codes that we consider below. 
Given a hypergraph $\mathcal{A}$, we define the associated interaction space by

\[ \mathcal{I}_{\mathcal{A}} := \sum_{A \in \mathcal{A}} \mathcal{I}_A. \]

Note that a function that depends only on its arguments in $A$ also depends only
on its arguments in any $B \supseteq A$. It therefore suffices to consider the inclusion maximal elements in
$\mathcal{A}$. We denote them by $\mathcal{A}^m$ and have

\[ \mathcal{I}_{\mathcal{A}} = \sum_{A \in \mathcal{A}^m} \mathcal{I}_A. \]

We consider the corresponding exponential family: 

Definition 3. The hierarchical model assigned to the hypergraph $\mathcal{A}$ is the exponential
family

\[ \mathcal{E}_{\mathcal{A}} := \exp(\mathcal{I}_{\mathcal{A}}). \]

We give two examples for hypergraphs:
Example 4. 

(1) Graphical models: Let $G = (V, E)$ be an undirected graph, and define

\[ \mathcal{A}_G := \{\emptyset \neq C \subseteq V : C \text{ is a clique with respect to } G\}. \]

Here, a clique is a set $C$ that satisfies the following property:

\[ a, b \in C, \ a \neq b \ \Rightarrow \ \{a, b\} \in E. \]

The exponential family $\mathcal{E}_{\mathcal{A}_G}$ is characterized by Markov properties with respect to
$G$ (see [Lau96]).

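(2) Interaction order: The hypergraph associated with a given interaction order
$k \in \{1, 2, \ldots, N\}$ is defined as

\[ \mathcal{A}_{k,N} := \{\emptyset \neq A \subseteq [N] : |A| \le k\}. \]

If appropriate, we will sometimes drop the $N$ and write $\mathcal{A}_k$. We have defined a
corresponding hierarchy of exponential families studied in [Ama01, AK06]:

\[ \mathcal{E}_{\mathcal{A}_1} \subseteq \mathcal{E}_{\mathcal{A}_2} \subseteq \cdots \subseteq \mathcal{E}_{\mathcal{A}_N} = \mathcal{P}(\mathcal{X}). \]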

The elements of this hierarchy have nice interpretations. It can be seen that the
closure of the family $\mathcal{E}_{\mathcal{A}_1}$ contains exactly all probability distributions that factor.
This means that

\[ p(x) = \prod_{i \in [N]} p_i(x_i), \]

where the $p_i(x_i)$ are the marginal distributions of $p$. Generally, an element $p \in \mathcal{E}_{\mathcal{A}_k}$ will
allow a factorization as

\[ p(x) = \prod_{A \subseteq [N], \, |A| = k} \phi_A(x_A), \]

where $\phi_A$ depends only on $x_A$. However, the $\phi_A$ are not necessarily probability
distributions and not unique. Note that $p \in \overline{\mathcal{E}_{\mathcal{A}_k}} \setminus \mathcal{E}_{\mathcal{A}_k}$, $k \ge 2$, does not necessarily
admit such a product structure.

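The notions of Definition 2 and the hypergraphs $\mathcal{A}_{k,N}$ of Example 4 can be made concrete in a few lines. The following Python sketch is only an illustration; the function names are our own choice, and sets are represented as frozensets of the indices $1, \ldots, N$.

```python
from itertools import combinations

def interaction_order_hypergraph(k, N):
    """All non-empty subsets of {1, ..., N} with at most k elements."""
    return {frozenset(c) for r in range(1, k + 1)
            for c in combinations(range(1, N + 1), r)}

def is_pre_hypergraph(family, N):
    """Non-empty family of non-empty sets that contains all atoms {i}."""
    return (len(family) > 0 and frozenset() not in family
            and all(frozenset({i}) in family for i in range(1, N + 1)))

def is_hypergraph(family, N):
    """A pre-hypergraph that is closed under taking non-empty subsets."""
    return is_pre_hypergraph(family, N) and all(
        frozenset(b) in family
        for a in family
        for r in range(1, len(a))
        for b in combinations(a, r))

def maximal_sets(family):
    """Inclusion maximal elements of the family."""
    return {a for a in family if not any(a < b for b in family)}
```

For instance, the family consisting of the three atoms and $\{1,2,3\}$ is a pre-hypergraph on $[3]$ but not a hypergraph, because the two-element subsets are missing.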
We will clarify these definitions in the following simple 
Example 5. Consider the case $N = 2$. The configuration space is given as

\[ \mathcal{X} := \{0,1\}^2 = \{(0,0), (0,1), (1,0), (1,1)\}. \]

The vector space of real valued functions $\mathbb{R}^{\mathcal{X}}$ is 4-dimensional and the probability
measures form a 3-dimensional tetrahedron. Considering the hypergraphs of fixed
interaction order and their exponential families, one has only two examples here:
$\mathcal{E}_{\mathcal{A}_{1,2}}$ and $\mathcal{E}_{\mathcal{A}_{2,2}} = \mathcal{P}(\mathcal{X})$, only the first being nontrivial. Figure 1 shows the situation.

Figure 1. The exponential family $\mathcal{E}_{\mathcal{A}_{1,2}}$ in the simplex of probability
distributions.

The exponential family $\mathcal{E}_{\mathcal{A}_{1,2}}$ is a two-dimensional manifold lying inside the simplex.
One should already think about this as a square (the two dimensional cube) molded
into the simplex.

In the following, we will study the interaction spaces more thoroughly by
comparing different generating systems.

3. Generating Systems of Interaction Spaces 

In this section, let A be fixed. In statistics, different representations of exponential 
families have been considered, each of which has its own benefits and highlights 
different aspects. We will review a number of these representations. The particular 
choice of parity functions will allow us to make a link to coding theory. 

As we have introduced exponential families, the key concept is the interaction 
space which is sometimes also called tangent space to the exponential family. 
This space completely characterizes the exponential family. However, there is a 
choice of the parameterization of this space, which has been made differently in 
different fields. Speaking in terms of linear algebra, one has to choose a generating 
system of a linear space. 

Let $\mathcal{B} := \{b_k : k \in K\}$ be any finite generating system of $\mathcal{I}_{\mathcal{A}}$. Each such choice
gives a different parameterization of the exponential family and a different sufficient 



HIERARCHICAL MODELS, MARGINAL POLYTOPES, AND LINEAR CODES 






statistics. The parameterization is identifiable if $\mathcal{B}$ is a basis. The exponential
family is parameterized as

\[ p_\theta(x) = Z_\theta^{-1} \exp\Big( \sum_{k \in K} \theta_k b_k(x) \Big), \qquad \theta \in \mathbb{R}^d, \]

where again $Z_\theta$ is the normalization and $d = |\mathcal{B}|$ equals the number of parameters.
In statistical physics the exponent is commonly called the energy.

To each choice of $\mathcal{B}$ there is a polytope constructed as follows. Consider the
vectors

\[ b_K(x) := (b_k(x))_{k \in K}, \qquad x \in \mathcal{X}. \]

Each such vector has as its components the evaluation of every element in $\mathcal{B}$ at $x$.
The polytope is

\[ P_{\mathcal{B}} := \operatorname{conv}\{b_K(x) : x \in \mathcal{X}\}. \]

Since $\mathcal{A}$ contains all atoms, it can be seen that the polytope has $|\mathcal{X}|$ vertices and
the dimension equals the dimension of the exponential family. By applying some
classical theorems from statistics, such as the existence and uniqueness of maximum
likelihood estimates [Kul68, Csi75], it can be seen that the points of the polytope are
in one to one correspondence with points in the closure of the exponential family. As
we have introduced it here, it is clear that the different choices of $\mathcal{B}$ yield different
representations of the same polytope in the sense that they are all affinely equivalent.
In particular, they have the same face lattice.

The polytope $P_{\mathcal{B}}$ encodes in its face lattice the combinatorial structure of the
exponential family in the sense that a knowledge of the face lattice gives precise
knowledge about the supports of elements in the closure of the exponential family.
However, direct computation is infeasible for real world problems.

In statistical physics, and also for various inference methods, it is of interest to
compute the free energy, given as the logarithm of the partition function. There,
variational principles and the techniques of Legendre transform are applied. In this
setting the points in the polytope are then the so called dual parameters. See for
instance [WJ03].

We will review a number of choices for $\mathcal{B}$:

Statistical Physics - Potentials. In statistical physics one considers so called potentials
[Win03, Geo88]. A potential is a collection of functions $U_A$, $A \subseteq [N]$, where $U_A \in \mathcal{I}_A$
and $U_\emptyset = 0$, such that the energy can be written as a linear combination hereof.
Typically one has a distinguished state $o$ called the vacuum. A potential is called
normalized if $U_A(x) = 0$ as soon as $x_i = o_i$ for some $i \in A$. Given a strictly positive
distribution, a corresponding normalized potential exists and is unique. In our
binary setting, choosing $(0, 0, \ldots, 0)$ as the vacuum state, the normalized potential
is given by the functions $U_A = c_A \prod_{i \in A} x_i$, where $c_A \in \mathbb{R}$.

One has $\mathcal{B} = \{\prod_{i \in A} x_i : A \in \mathcal{A}\}$, and a basis of the interaction space is given by
$\mathcal{B}$ together with the constant function $x \mapsto 1$. Expanding a function $H \in \mathbb{R}^{\mathcal{X}}$ in terms
of this basis was called the x-expansion in the works of Caianiello [Cai86, Cai75].

In the case of pair interactions, where the hypergraph is given by $\mathcal{A}_{2,N}$, the
polytope $P_{\mathcal{B}}$ coincides with the so called correlation polytope [DL97]. Extending
the terminology to an arbitrary hypergraph $\mathcal{A}$, we call $P_{\mathcal{B}}$ the moment polytope, as
each point in it is the vector of moments of some distribution.


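The parameterization by a generating system and the role of the moment polytope can be illustrated numerically. The Python sketch below assumes the monomial generating system $\prod_{i \in A} x_i$ and writes the family member as $p_\theta(x) = \exp(\langle \theta, b_K(x) \rangle) / Z_\theta$; the function names are hypothetical. The moment vector it returns is a convex combination of the vectors $b_K(x)$ and therefore a point of the polytope.

```python
from itertools import product
from math import exp

def monomial_statistics(family, x):
    """b_K(x) for the monomial system: prod_{i in A} x_i for each A (1-based indices)."""
    return [int(all(x[i - 1] == 1 for i in A)) for A in family]

def exponential_family_point(family, theta, N):
    """p_theta(x) = exp(<theta, b_K(x)>) / Z_theta on {0,1}^N."""
    states = list(product((0, 1), repeat=N))
    weights = [exp(sum(t * b for t, b in zip(theta, monomial_statistics(family, x))))
               for x in states]
    Z = sum(weights)  # normalization Z_theta
    return {x: w / Z for x, w in zip(states, weights)}

def moment_vector(family, p):
    """Expectation of b_K under p, a convex combination of the vectors b_K(x)."""
    return [sum(p[x] * monomial_statistics(family, x)[j] for x in p)
            for j in range(len(family))]

family = [(1,), (2,), (1, 2)]                      # the hypergraph A_{2,2}
uniform = exponential_family_point(family, [0.0, 0.0, 0.0], 2)
moments = moment_vector(family, uniform)           # moments of the uniform distribution
```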

THOMAS KAHLE, WALTER WENZEL, AND NIHAT AY 



Marginals. One representation of an exponential family is given via the linear map
that computes the marginals. Denote by $\mathcal{A}^m$ the set of inclusion maximal sets in $\mathcal{A}$.
Consider the linear map

\[ \pi_{\mathcal{A}} : \mathbb{R}^{\mathcal{X}} \to \bigoplus_{A \in \mathcal{A}^m} \mathbb{R}^{\mathcal{X}_A}, \qquad u \mapsto (u_A)_{A \in \mathcal{A}^m}, \]

that, for a given vector $u$, computes the set of its maximal marginals, defined as

\[ u_A(x_A) := \sum_{y \, : \, X_A(y) = x_A} u(y). \]

When represented as a matrix with respect to the canonical basis, $\pi_{\mathcal{A}}$ has rows
indexed by pairs $(A, y_A)$ of a set $A \in \mathcal{A}^m$ and a configuration $y_A \in \mathcal{X}_A$. The columns
are indexed by configurations $x \in \mathcal{X}$. Each component then contains the value of
the indicator $\mathbf{1}_{\{x_A = y_A\}}$.

Denote the $x$-th column of this matrix as $\pi_x$; then the exponential family is
parameterized as

\[ p(x) = Z_\theta^{-1} \exp(\langle \theta, \pi_x \rangle), \qquad \theta \in \mathbb{R}^d. \]

In terms of these vectors, the polytope is commonly called the marginal polytope.
It is represented as a 0/1 polytope embedded in a high dimensional space.
An orthogonal basis of characters. In the binary case $\mathcal{X} = \{0,1\}^{[N]}$, a natural
basis for $\mathbb{R}^{\mathcal{X}}$ is given by the characters of $\mathcal{X}$. Here, we assume pointwise addition
modulo 2 as the group operation. For every subset $A \in \mathcal{A}$ define the function
$e_A : \mathcal{X} \to \{-1, 1\}$ by

\[ e_A(x) := (-1)^{E(A,x)}, \]

where $E(A,x) := |\{i \in A : x_i = 1\}|$. It can be seen that, if $\mathcal{A}$ is a hypergraph,
$\{e_A : A \in \mathcal{A}\}$ together with the constant function $e_\emptyset : x \mapsto 1$ is an orthogonal basis
of the interaction space $\mathcal{I}_{\mathcal{A}}$. This approach was followed in [KA06]. Various people,
starting with Caianiello [Cai75], have called this the $\eta$-expansion. Note that if one
considers random variables taking values in $\{\pm 1\}$, this basis equals the monomial
basis $\{\prod_{i \in A} x_i : A \subseteq [N]\}$ considered above.

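A basis of parity functions. Finally, we will introduce yet another basis of $\mathcal{I}_{\mathcal{A}}$ which
is derived from the basis of characters. To each $\emptyset \neq A \subseteq [N]$, we define a vector in
$\mathbb{R}^{\mathcal{X}}$:

\[ f_A(x) := \begin{cases} 1 & \text{if } |\operatorname{supp}(x) \cap A| \text{ is odd,} \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

The following proposition is easily checked:

Proposition 6. Let $\mathbf{1} : \mathcal{X} \to \mathbb{R}$ be the constant function $x \mapsto 1$. The set
$\{f_A : A \in \mathcal{A}\} \cup \{\mathbf{1}\}$ is a basis of $\mathcal{I}_{\mathcal{A}}$.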

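The parity functions and the characters are related pointwise by $e_A = 1 - 2 f_A$, which is one way to see why both systems, together with the constant function, span the same space. The following Python sketch checks this relation and the orthogonality of the characters for $N = 3$; the helper names are our own, and index sets are 1-based tuples.

```python
from itertools import combinations, product

def parity(A, x):
    """f_A(x): 1 if x has an odd number of ones on the positions in A, else 0."""
    return sum(x[i - 1] for i in A) % 2

def character(A, x):
    """e_A(x) = (-1)^{E(A, x)}, where E(A, x) counts the ones of x inside A."""
    return (-1) ** sum(x[i - 1] for i in A)

N = 3
states = list(product((0, 1), repeat=N))
subsets = [c for r in range(1, N + 1) for c in combinations(range(1, N + 1), r)]

# e_A = 1 - 2 f_A holds for every non-empty A and every configuration x
relation_holds = all(character(A, x) == 1 - 2 * parity(A, x)
                     for A in subsets for x in states)

# distinct characters are orthogonal for the standard inner product on R^X
orthogonal = all(sum(character(A, x) * character(B, x) for x in states) == 0
                 for A in subsets for B in subsets if A != B)
```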
One crucial point about choosing this representation is that it gives, if the constant
function is omitted, a full dimensional 0/1 polytope, the vertices of which form an
additive group and thereby a linear code (see Proposition 10). For all other choices
of $\mathcal{B}$ discussed in this section the image of $b_K$ is not a subgroup of $\{0,1\}^d$ or the
multiplicative group $\{\pm 1\}^d$.

While in the construction of a hierarchical model we assumed a hypergraph, the
following polytope is an interesting object of study also in the general case of a
pre-hypergraph:

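Definition 7. Let $\mathcal{A}$ be a pre-hypergraph and write $f_{\mathcal{A}}(x) := (f_A(x))_{A \in \mathcal{A}}$. We define

\[ \mathcal{F}_{\mathcal{A}} := \operatorname{conv}\{f_{\mathcal{A}}(x) : x \in \mathcal{X}\}. \]

If $\mathcal{A}$ is a hypergraph, then this is affinely equivalent to the marginal polytope
of the corresponding exponential family. In the case of the hypergraphs $\mathcal{A}_{k,N}$ we
write $\mathcal{F}_{k,N} := \mathcal{F}_{\mathcal{A}_{k,N}}$. The rest of the paper is devoted to the study of this class of
polytopes.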

Remark (CUT-Polytopes). There is a well known [DL97] affine equivalence between
CUT polytopes of graphs [Zie00] and binary marginal polytopes:

Namely, to each graph $G$ we can associate the hypergraph $\mathcal{A}_G = V(G) \cup E(G)$.
This is distinct from what was called a graphical model above, as the cliques
are not considered. Some authors refer to the corresponding statistical model as a graph
model. From $G$ construct the coned graph $\hat{G}$ with an additional vertex:

\[ V(\hat{G}) := V(G) \cup \{*\}, \]

and edges

\[ E(\hat{G}) := E(G) \cup \{(v, *) : v \in V(G)\}. \]

Then, denoting the CUT polytope of $\hat{G}$ as $\mathrm{CUT}(\hat{G})$, one has

\[ \mathcal{F}_{\mathcal{A}_G} = \mathrm{CUT}(\hat{G}). \]

Using the representation in terms of the vectors $f_{\mathcal{A}_G}(x)$, $x \in \mathcal{X}$, the proof of this
equivalence becomes a simple renaming of coordinates.

Remark (Covariance Mapping). As remarked above, in the representation with
monomials $\prod_{i \in A} x_i$ one finds the correlation polytope $\mathrm{COR}(N)$ as a special case.

From the last remark it follows that the CUT-polytope of the complete graph
$K_{N+1}$ is equal to $\mathcal{F}_{2,N}$. There exists an affine equivalence between $\mathrm{COR}(N)$
and $\mathrm{CUT}(K_{N+1})$ called the covariance mapping [DL97]. It can be seen that this
mapping generalizes to a mapping between binary marginal polytopes and the
corresponding moment polytopes. It therefore might be suitable to consider the
parity representations $\mathcal{F}_{\mathcal{A}}$ of binary marginal polytopes for a generalization of
CUT-polytopes to arbitrary (pre-)hypergraphs.

3.1. Computations and elementary properties. Using the geometry software
polymake [GJ00], one can compute linear descriptions of polytopes. As an example,
we give here the f-vectors of $\mathcal{F}_{k,N}$ for the cases $N = 3, 4$. For $N = 5$, the f-vector is
too complicated to be computed by the brute force approach of polymake. However,
waiting sufficiently long, one can get the 6800 facet defining inequalities of $\mathcal{F}_{3,5}$ and
the 3835488 facets of T±&. 

Example 8. In Tables 1 and 2 we give the f-vectors of $\mathcal{F}_{k,N}$ for $N = 3, 4$,
computed using polymake. The rows label the dimension of the faces, the columns
the value of $k$. The reader might wonder about the fact that the face lattices of
$\mathcal{F}_{k,N}$ are, up to a certain dimension, isomorphic to the face lattice of the simplex. This
property, commonly called neighborliness, follows from a general result in [Kah08].
The last row refers to whether the polytope is simple or not.

Table 1. Face structure of $\mathcal{F}_{k,3}$

Table 2. Face structure of $\mathcal{F}_{k,4}$



In the following, we will list elementary properties of $\mathcal{F}_{k,N}$ that follow easily from
the definition.

(i) $\mathcal{F}_{1,N}$ is the $N$-cube.

(ii) $\mathcal{F}_{N,N}$ is the $(2^N - 1)$-dimensional simplex.

(iii) every $\mathcal{F}_{k,N}$ has dimension $d = |\mathcal{A}_{k,N}|$.

(iv) every $\mathcal{F}_{k,N}$ has $2^N$ vertices.

(v) $(0, \ldots, 0)$ is a vertex.

(vi) every $\mathcal{F}_{k,N}$ is a projection of the $(2^N - 1)$-dimensional simplex $\mathcal{F}_{N,N}$ along
coordinate axes.

(vii) For every $\mathcal{F}_{k,N}$, there is a projection along coordinate axes that projects it to
the $N$-cube $\mathcal{F}_{1,N}$.
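Properties (iii), (iv) and (v) are easy to confirm by machine for small $N$. The sketch below is one possible check in Python, not the polymake computation mentioned above; the helper names are ours, index sets are 0-based, and the rank is computed exactly over the rationals. Since the origin is a vertex, the affine dimension of the polytope equals the linear rank of its vertex vectors.

```python
from fractions import Fraction
from itertools import combinations, product

def parity_vertices(k, N):
    """Vectors f_A(x), x in {0,1}^N, where A runs over all sets with 1 <= |A| <= k."""
    sets = [c for r in range(1, k + 1) for c in combinations(range(N), r)]
    return [tuple(sum(x[i] for i in A) % 2 for A in sets)
            for x in product((0, 1), repeat=N)]

def rank(rows):
    """Rank over the rationals by Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def check_properties(k, N):
    verts = parity_vertices(k, N)
    d = len(verts[0])                    # d = |A_{k,N}|
    return (len(set(verts)) == 2 ** N    # (iv): 2^N distinct vertices
            and (0,) * d in verts        # (v): the origin is a vertex
            and rank(verts) == d)        # (iii): full dimension d
```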
Remark. In [HS02] it was remarked that $\mathcal{F}_{N-1,N}$ has exactly $4^{N-1}$ facets. The
extreme points of these facets are also known. A set $\mathcal{Y} \subseteq \mathcal{X}$ defines a face if and
only if it contains neither $U := \{x \in \mathcal{X} : f_{[N]}(x) = 1\}$ nor its complement. Note
that the set $U$ and its complement are exactly the sets of configurations with a fixed
parity. As the vertices of $\mathcal{F}_{N-1,N}$ have only one affine dependency, it is not difficult
to prove this fact using the Gale transform. By the above, $\mathcal{F}_{N-1,N}$ is combinatorially
isomorphic to the so called cyclic polytope [Zie94].

In the following, we develop the connection to coding theory.

4. A Link to Coding Theory 

We briefly recall the definition of a linear code. For a detailed introduction to
coding theory see for instance [van99]. Consider the finite field $\mathbb{F}_2 = (\{0, 1\}, \oplus, \odot)$
with addition and multiplication mod 2. In coding theory, one studies particularly
vector spaces over this field.

Definition 9. A binary $[n,k]$-linear code is a linear subspace $L$ of $\mathbb{F}_2^n$ such that
$\dim L = k$. A generator matrix $G$ for $L$ is a $k$ by $n$ matrix which has as its rows a
basis of $L$. Given $L$ one can find an equivalent$^1$ code such that it has a generator
matrix in standard form, i.e. $G = (E_k, H)$, where $E_k$ is the $k$ by $k$ identity matrix.

The following proposition states that the vertices of $\mathcal{F}_{\mathcal{A}}$ form a linear code for
any pre-hypergraph $\mathcal{A}$. A special case of this connection has been mentioned in
Example 2 in [WJ03].

Proposition 10. Let $\{0, 1\}^{\mathcal{A}}$ be considered as a vector space over the finite field $\mathbb{F}_2$.
Then the image of $\mathcal{X}$ under $f_{\mathcal{A}}$ is a linear subspace. If we also consider $\mathcal{X} = \{0,1\}^N$ as
a vector space over $\mathbb{F}_2$, $f_{\mathcal{A}}$ is an injective homomorphism between vector spaces. Its
image forms an $[|\mathcal{A}|, N]$-linear code. A generator matrix in standard form has as
its rows the vectors $f_{\mathcal{A}}(e_i)$ for $i = 1, \ldots, N$, where $e_i$ is the $i$-th unit vector in $\mathbb{F}_2^N$.

Proof. Since scalar multiplication is trivial, we only need to show

\[ f_{\mathcal{A}}(x \oplus y) = f_{\mathcal{A}}(x) \oplus f_{\mathcal{A}}(y) \quad \text{for } x, y \in \mathcal{X}. \tag{2} \]

Let $A \in \mathcal{A}$; it suffices to show the identity for $f_A$. To do so, introduce

\[ M := \{i \in A : (x_i = 1 \wedge y_i = 0) \vee (x_i = 0 \wedge y_i = 1)\}, \]
\[ M_x := \{i \in A : x_i = 1\}, \]
\[ M_y := \{i \in A : y_i = 1\}. \]

Then, modulo 2, $f_A(x \oplus y) = |M|$, $f_A(x) = |M_x|$, and $f_A(y) = |M_y|$. We find that $M$ is the
symmetric difference of $M_x$ and $M_y$:

\[ M = M_x \, \triangle \, M_y. \]

Since $|M_x \, \triangle \, M_y| = |M_x| + |M_y| - 2 |M_x \cap M_y|$, we have that in $\mathbb{F}_2$

\[ |M| = |M_x| \oplus |M_y|, \]

and therefore (2) holds. We now show that $f_{\mathcal{A}}$ is injective. To see this, assume
that $f_{\mathcal{A}}(x) = f_{\mathcal{A}}(y)$. Since $\mathcal{A}$ contains all atoms $\{i\} \subseteq [N]$, we get for every $i \in [N]$:
$f_{\{i\}}(x) = f_{\{i\}}(y)$. This implies $x_i = y_i$ and, hence, $x = y$. Since $\mathcal{X}$ considered as an
$\mathbb{F}_2$ vector space has dimension $N$, also $f_{\mathcal{A}}(\mathcal{X})$ has dimension $N$ and therefore forms
an $[|\mathcal{A}|, N]$-linear code. □

$^1$ Two codes are called equivalent if one can be transformed into the other by applying a
permutation on the positions in the codewords, and for each position a permutation of the symbols.

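The statement of Proposition 10 can be checked directly on a small pre-hypergraph. The Python sketch below uses a hypothetical family on three variables (0-based indices, atoms listed first) and verifies additivity over $\mathbb{F}_2$, injectivity, and the standard form of the generator matrix with rows $f_{\mathcal{A}}(e_i)$.

```python
from itertools import product

def f(family, x):
    """Codeword (f_A(x))_{A in family} for a list of 0-based index sets."""
    return tuple(sum(x[i] for i in A) % 2 for A in family)

def xor(u, v):
    """Componentwise addition modulo 2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

N = 3
family = [(0,), (1,), (2,), (0, 1), (0, 1, 2)]   # illustrative pre-hypergraph, atoms first
states = list(product((0, 1), repeat=N))

# f is additive over F_2, as in equation (2) of the proof
is_linear = all(f(family, xor(x, y)) == xor(f(family, x), f(family, y))
                for x in states for y in states)

# f is injective, so its image has 2^N elements
is_injective = len({f(family, x) for x in states}) == 2 ** N

# rows f(e_i) of the generator matrix; the first N columns form the identity
unit = [tuple(int(i == j) for j in range(N)) for i in range(N)]
generator = [f(family, e) for e in unit]
```

With the atoms numbered first, the rows of `generator` start with the identity matrix, which is the standard form mentioned in the proposition.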
Remark. To write down the generator matrix, one has to impose a numbering on
the elements in $\mathcal{A}$. If the numbering is chosen such that $A_i = \{i\}$ for $1 \le i \le N$,
then the generator matrix is in standard form.

Remark. An important property of a linear code is its distance, which is defined
as the minimal Hamming distance between different elements of the code. For the
hierarchical model of the hypergraph $\mathcal{A}_{k,N}$, the distance of the code is given by

\[ \sum_{l=0}^{k-1} \binom{N-1}{l}. \]

Proof. Let $d(x, y)$ denote the Hamming distance of $x, y \in \mathcal{X}$. If $d(x, y) = 1$, then
$d(f_{\mathcal{A}_k}(x), f_{\mathcal{A}_k}(y))$ equals the number of subsets of $[N]$ which contain a given element
and have cardinality at most $k$. □

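The count used in the proof, the number of subsets of $[N]$ of cardinality at most $k$ that contain a fixed element, is $\sum_{l=0}^{k-1} \binom{N-1}{l}$. The following Python sketch compares this number with a brute force computation of the minimum weight of a non-zero codeword for small parameters; it is a numerical sanity check under our own naming, not a proof.

```python
from itertools import combinations, product
from math import comb

def min_distance(k, N):
    """Minimum Hamming weight of a non-zero codeword f(x) for the hypergraph A_{k,N}."""
    sets = [c for r in range(1, k + 1) for c in combinations(range(N), r)]
    return min(sum(sum(x[i] for i in A) % 2 for A in sets)
               for x in product((0, 1), repeat=N) if any(x))

def subsets_through_point(k, N):
    """Number of subsets of [N] of size at most k that contain a fixed element."""
    return sum(comb(N - 1, l) for l in range(k))

# compare both quantities for all 1 <= k <= N <= 5
agree = all(min_distance(k, N) == subsets_through_point(k, N)
            for N in range(2, 6) for k in range(1, N + 1))
```

Because the code is linear, the minimum distance equals the minimum weight of a non-zero codeword, which is what `min_distance` computes.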
In the following, we will elaborate on the opposite direction. Let $2^N > s \ge N$.
Assume we are given an $[s, N]$ linear code. Without loss of generality, we assume
that it has a generator matrix in standard form. We will construct a pre-hypergraph
$\mathcal{A}$ from the columns of the generator matrix. Since $\mathcal{A}$ is a set, while the columns are
a list, repetitions of columns will be lost. If one considers only non-repetitive codes,
then our construction is injective, and the codewords are given by the vertices of
$\mathcal{F}_{\mathcal{A}}$.

Let $E_N \in \mathbb{F}_2^{N \times N}$ denote the identity matrix in dimension $N$. Assume the
generator matrix $G = (E_N, H) \in \mathbb{F}_2^{N \times s}$ has no two identical columns. (This implies
$s < 2^N$.) Denote by $\{e_i : i = 1, \ldots, N\}$ the canonical basis of $\mathbb{F}_2^N$. Using the columns
of $H$, we define sets

\[ A_j := \{i \in [N] : H_{ij} = 1\}, \qquad j = 1, \ldots, s - N, \]

and then,

\[ \mathcal{A} := \{\{1\}, \ldots, \{N\}, A_1, \ldots, A_{s-N}\}. \]

Note that the elements of $\mathcal{A}$ are numbered in a natural way such that we can use $\mathcal{A}$
as an index set for the columns of $G = (G_{i,A})_{i = 1, \ldots, N, \; A = \{1\}, \ldots, \{N\}, A_1, \ldots, A_{s-N}}$.

To see that $\{f_{\mathcal{A}}(e_i) : i = 1, \ldots, N\}$ is the set of rows of the generator matrix, we
evaluate

\[ f_A(e_i) = \delta_{\{i \in A\}} = G_{i,A}, \]

which holds by definition of the $A_j$.

Summarizing, every binary linear code (in standard form) corresponds to a
pre-hypergraph. However, two codes that differ only in repetitions of columns in
the generator matrix will be mapped to the same pre-hypergraph. If this pre-hypergraph is a
hypergraph, the linear code is the linear code of a hierarchical model.

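The construction of a pre-hypergraph from a generator matrix can be traced on a toy example. The Python sketch below uses a made-up $[5,3]$ code in standard form without repeated columns; the matrix and all function names are assumptions for illustration. It reads off one index set per column and confirms that the codewords coincide with the parity vectors $f_{\mathcal{A}}(x)$.

```python
from itertools import product

def pre_hypergraph_from_generator(G):
    """One index set A = {i : G[i][col] = 1} per column of G (0-based indices)."""
    N = len(G)
    return [frozenset(i for i in range(N) if G[i][col] == 1)
            for col in range(len(G[0]))]

def codewords(G):
    """All F_2-linear combinations of the rows of G."""
    N, s = len(G), len(G[0])
    return {tuple(sum(c[i] * G[i][j] for i in range(N)) % 2 for j in range(s))
            for c in product((0, 1), repeat=N)}

def parity_vertices(family, N):
    """The vectors (f_A(x))_{A in family} for all x in {0,1}^N."""
    return {tuple(sum(x[i] for i in A) % 2 for A in family)
            for x in product((0, 1), repeat=N)}

# hypothetical [5, 3] code in standard form G = (E_3, H)
G = [[1, 0, 0, 1, 1],
     [0, 1, 0, 1, 0],
     [0, 0, 1, 0, 1]]
family = pre_hypergraph_from_generator(G)
```

Here the family contains $\{0,1\}$ and $\{0,2\}$ together with the atoms, so it is closed under non-empty subsets and the code is the code of a hierarchical model.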


5. Classification 



As we have seen, the polytopes $\mathcal{F}_{\mathcal{A}}$ are full dimensional polytopes such that
the vertices form a linear code. In this last section, we classify all polytopes with
this property. Then we investigate which of them can be realized as polytopes of
hierarchical models. For a convex polytope $P$, let $V(P)$ denote the vertex set of $P$.
For $n \in \mathbb{N}$, put

\[ C_n := [0,1]^n, \]
\[ W_n := \{0,1\}^n = V(C_n). \]

Hence $(W_n, \oplus)$ is an Abelian group that is canonically isomorphic to $(\mathbb{F}_2^n, \oplus)$. We
consider $W_n$ as a subset of $\mathbb{R}^n$ and write "$\oplus$" whenever we mean addition modulo
2, while "$+$" means ordinary addition in $\mathbb{R}^n$.

In the following, we develop an algorithm that determines, by induction for every
$n \in \mathbb{N}$, all polytopes $P \subseteq \mathbb{R}^n$ with $V(P) \subseteq W_n$ satisfying the following conditions:

(I) $(V(P), \oplus)$ is a subgroup of $(W_n, \oplus)$.
(II) $P$ has dimension $n$.

Note that the number of vertices of such a polytope is a power of two. Of course,
the full $n$-dimensional cube $P = C_n$ satisfies (I) and (II). To start the induction, we
remark that there are no further such polytopes in the cases $n = 1$ and $n = 2$. For
$n = 3$, the 3-dimensional regular simplex $S$ with

\[ V(S) = \{(0,0,0), (1,1,0), (1,0,1), (0,1,1)\} \]

satisfies (I) and (II), too.

More generally, by [Wen06, Theorem 2.2], we have the following

Proposition 11. For $n \ge 3$, the following statements are equivalent:

(1) $(W_n, \oplus)$ contains some subgroup $U$ such that $\operatorname{conv}(U)$ is a regular simplex
of dimension $n$.

(2) $n + 1$ is some power of 2.

In the case $n = 3$, the full 3-cube as well as the regular simplex mentioned above
are the only polytopes satisfying conditions (I) and (II). Note that also

\[ \{(0,0,0), (1,0,0), (0,1,1), (1,1,1)\} \]

determines a subgroup of $(W_3, \oplus)$; however, its convex hull has dimension 2.
For fixed $n \ge 2$, define the bijections $\pi_0 : \mathbb{R}^n \times \{0\} \to \mathbb{R}^n$ and $\pi_1 : \mathbb{R}^n \times \{1\} \to \mathbb{R}^n$
by

\[ \pi_0(x_1, \ldots, x_n, 0) := (x_1, \ldots, x_n), \]
\[ \pi_1(x_1, \ldots, x_n, 1) := (x_1, \ldots, x_n). \]

For $1 \le i \le n$ put

\[ H_i := \{(x_1, \ldots, x_n) \in \mathbb{R}^n : x_i = 0\}, \]
\[ H'_i := \{(x_1, \ldots, x_n) \in \mathbb{R}^n : x_i = \tfrac{1}{2}\}. \]

Moreover, let $z := (\tfrac{1}{2}, \ldots, \tfrac{1}{2})$ denote the center of the $n$-cube $C_n$.

To determine recursively all 0/1-polytopes $P \subseteq \mathbb{R}^n$ that fulfill (I) and (II), we
prove first the following

Proposition 12. Suppose that $n \ge 2$ and that $P \subseteq \mathbb{R}^n$ is a 0/1-polytope satisfying
(I) and (II). Assume that $(U, \oplus)$ is a subgroup of $(V, \oplus) := (V(P), \oplus)$ with $|V : U| =
2$. Then the following statements are equivalent:

(i) The polytope $Q \subseteq \mathbb{R}^{n+1}$, given by

\[ V(Q) = \pi_0^{-1}(U) \cup \pi_1^{-1}(V \setminus U), \tag{3} \]

has dimension $n + 1$.
(ii) There does not exist some index $i$ with $1 \le i \le n$ such that $U \subseteq H_i$. In other
words, none of the affine hyperplanes $H'_i$ separates $\operatorname{conv}(U)$ and $\operatorname{conv}(V \setminus U)$.
(iii) One has $z \in \operatorname{conv}(U) \cap \operatorname{conv}(V \setminus U)$.
(iv) One has $\operatorname{conv}(U) \cap \operatorname{conv}(V \setminus U) \neq \emptyset$.

Proof. (i) → (ii):

Suppose that $U \subseteq H_i$ holds for some $i$ with $1 \le i \le n$. Put

\[ H := \{(x_1, \ldots, x_n, x_{n+1}) \in \mathbb{R}^{n+1} : x_{n+1} = x_i\}. \]

Since $P = \operatorname{conv}(V)$ has dimension $n$ and since $|V : U| = 2$, we must have $x_i = 1$
whenever $(x_1, \ldots, x_n) \in V \setminus U$. This means that $V(Q)$, and hence also $Q$, is
contained in the $n$-dimensional hyperplane $H$, in contradiction to (i).

(ii) → (iii):

For $1 \le i \le n$, let $\alpha_i : \mathbb{F}_2^n \to \mathbb{F}_2$ denote the linear map given by $\alpha_i(x_1, \ldots, x_n) := x_i$.
By assumption, $\alpha_i|_U$ is surjective for $1 \le i \le n$. Hence we have

\[ |\{u \in U : \alpha_i(u) = 0\}| = |\{u \in U : \alpha_i(u) = 1\}| \quad \text{for } 1 \le i \le n. \]

This means that

\[ z = \frac{1}{|U|} \sum_{u \in U} u \in \operatorname{conv}(U), \]

where the sum is taken in $\mathbb{R}^n$.

Now fix $v_1 \in V \setminus U$. Since $V \setminus U = \{v_1 \oplus u : u \in U\}$, we get also

\[ |\{v \in V \setminus U : \alpha_i(v) = 0\}| = |\{v \in V \setminus U : \alpha_i(v) = 1\}| \quad \text{for } 1 \le i \le n, \]

and hence

\[ z = \frac{1}{|V \setminus U|} \sum_{v \in V \setminus U} v \in \operatorname{conv}(V \setminus U). \]



(iii) → (iv): This is immediate.

(iv) → (i):

Consider the projection $\pi : \mathbb{R}^{n+1} \to \mathbb{R}^n$ given by

\[ \pi(x_1, \ldots, x_n, x_{n+1}) = (x_1, \ldots, x_n). \]

Suppose that the assertion is wrong; hence $Q$ is contained in some $n$-dimensional
(homogeneous) hyperplane $G \subseteq \mathbb{R}^{n+1}$. Since

\[ \pi(V(Q)) = U \cup (V \setminus U) = V, \]

the polytope $Q$ has the same dimension as $P = \operatorname{conv}(V)$, that is $n$. Thus, the
restriction $\pi|_G$ is a linear isomorphism from $G$ onto $\mathbb{R}^n$, and there exists some
$\mathbb{R}$-linear map $\alpha : \mathbb{R}^n \to G$ satisfying

\[ (\alpha \circ \pi)(w) = w \quad \text{for all } w \in G. \]

By definition of $Q$ this means:

\[ \alpha(U) = \pi_0^{-1}(U), \]
\[ \alpha(V \setminus U) = \pi_1^{-1}(V \setminus U). \]

Hence, $\alpha(\operatorname{conv}(U)) = \operatorname{conv}(\alpha(U))$ and $\alpha(\operatorname{conv}(V \setminus U)) = \operatorname{conv}(\alpha(V \setminus U))$ are linearly
separated by the affine hyperplane

\[ K := \{(x_1, \ldots, x_n, x_{n+1}) \in \mathbb{R}^{n+1} : x_{n+1} = \tfrac{1}{2}\}. \]

By (iv) this is impossible. □


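For the cube, the equivalences of Proposition 12 can be tested exhaustively. The Python sketch below takes $P = C_3$, runs through all index 2 subgroups $U$ (the kernels of the parity functions $f_A$), builds the lift $Q$ from equation (3), and evaluates condition (i) by an exact rank computation, condition (ii) directly, and the barycentre identity used in the step (ii) → (iii). All names are our own.

```python
from fractions import Fraction
from itertools import combinations, product

def rank(rows):
    """Rank over the rationals by Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk

n = 3
V = list(product((0, 1), repeat=n))              # vertices of the n-cube
results = []
for r in range(1, n + 1):
    for A in combinations(range(n), r):
        U = [v for v in V if sum(v[i] for i in A) % 2 == 0]   # kernel of f_A
        rest = [v for v in V if v not in U]
        Q = [u + (0,) for u in U] + [v + (1,) for v in rest]  # lift as in (3)
        # (i): Q contains the origin, so its dimension equals the rank of its vertices
        full_dimensional = rank(Q) == n + 1
        # (ii): U is not contained in any coordinate hyperplane H_i
        no_hyperplane = not any(all(u[i] == 0 for u in U) for i in range(n))
        # the barycentre of U equals z = (1/2, ..., 1/2), as in the proof of (ii) -> (iii)
        barycentre_is_z = all(2 * sum(u[i] for u in U) == len(U) for i in range(n))
        results.append((A, full_dimensional, no_hyperplane, barycentre_is_z))
```

In this example the three conditions fail together exactly for the atoms $A = \{i\}$ and hold together for every $A$ with at least two elements.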

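Example 13. We investigate the statement of Proposition 12 for polytopes
corresponding to a pre-hypergraph $\mathcal{A}$. We start by considering the matrix which has
as its rows the vectors $f_{\mathcal{A}}(x)$, where $\mathcal{A} = 2^{[N]} \setminus \{\emptyset\}$. The rows are labeled by the
binary strings of length $N$, that is by $\mathcal{X}$, while the columns are indexed by the
non-empty subsets of $[N]$. Therefore the rows of this matrix are the coordinates of
the vertices of the simplex $\mathcal{F}_{N,N}$:

\[
\begin{array}{c|ccc|cc|cc|c}
x & \{1\} & \cdots & \{N\} & \{1,2\} & \cdots & \{1,2,3\} & \cdots & [N] \\ \hline
(00 \ldots 0) & 0 & \cdots & 0 & 0 & \cdots & 0 & \cdots & 0 \\
(00 \ldots 1) & 0 & \cdots & 1 & f_{\{1,2\}}(x) & \cdots & f_{\{1,2,3\}}(x) & \cdots & f_{[N]}(x) \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & & \vdots \\
(11 \ldots 1) & 1 & \cdots & 1 & f_{\{1,2\}}(x) & \cdots & f_{\{1,2,3\}}(x) & \cdots & f_{[N]}(x)
\end{array}
\]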



We note the following facts:

• The columns of this matrix correspond one to one to the $2^N - 1$ non-zero binary strings
of length $N$, namely the indicator vectors of the sets $A$.

• There are $2^N - 1$ subgroups $U$ of index 2 of the $N$-cube, which correspond
to the columns of the matrix. To define them let a column $A$ be fixed, then
put $U := \{x \in \mathcal{X} : f_A(x) = 0\}$. The maps $f_A : \mathcal{X} \to \{0, 1\}$ are exactly the
$2^N - 1$ surjective homomorphisms having the subgroups of index 2 as their
kernels.

• The vertices of every polytope $\mathcal{F}_{\mathcal{A}}$ are given by deleting columns from this
matrix that correspond to sets not in $\mathcal{A}$.

• In particular, by restriction to the first $N$ columns, we get the vertices of
the $N$-cube $\mathcal{F}_{1,N}$.

Now, assume that $P$ is the $N$-cube. We choose a column of the matrix, corresponding
to a subgroup of index 2. There are two possibilities. If we choose a column
corresponding to an atom, then (ii) fails, and the dimension does not grow when
adding this column to the coordinates (as we have doubled a coordinate). If, on the
other hand, we choose a column corresponding to a set $A$ with cardinality two or
more, then we are in the situation of Proposition 12 since (ii) holds. The lift (3)
will be full dimensional, and its vertices are given by the submatrix with columns
$\{1\}, \ldots, \{N\}, A$. Continuing from here, choosing another subgroup, the dimension
will grow if and only if it does not correspond to one of the sets $\{1\}, \ldots, \{N\}, A$.
Iteratively, the choices narrow down and, finally, when all columns have been chosen,
the polytope $Q$ is a simplex.

We will now formalize this procedure. For a fixed polytope $P$ as in Proposition
12 put

\[ U_i := V(P) \cap H_i \quad \text{for } 1 \le i \le n. \]

Clearly, conditions (I) and (II) imply that $|V(P) : U_i| = 2$ holds whenever $1 \le i \le n$.
Based on the equivalence of (i) and (ii) in Proposition 12, we are now able to
prove that the following algorithm yields recursively all 0/1-polytopes satisfying
(I) and (II).

Algorithm 14.

Initialization for $n = 1$:

• $\mathfrak{P}_1 := \{[0,1]\}$.

Step $n \to n + 1$: Based on $\mathfrak{P}_n$ construct a new set $\mathfrak{P}_{n+1}$ consisting of all polytopes
$Q$ such that there exists $P \in \mathfrak{P}_n$ with

• $Q = P \times [0, 1]$ or

• $Q \subseteq \mathbb{R}^{n+1}$ with

\[ V(Q) = \pi_0^{-1}(U) \cup \pi_1^{-1}(V(P) \setminus U), \tag{4} \]

where $U$ runs through all subgroups of $(V(P), \oplus)$ with $|V(P) : U| = 2$ and
$U \neq U_i$ for $1 \le i \le n$.

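Algorithm 14 translates almost literally into code. The following Python sketch is one possible implementation under our own conventions: a polytope is stored as the frozenset of its vertices, each vertex in $\{0,1\}^n$ is encoded as an integer bit mask (bit $i$ is the coordinate $x_{i+1}$), and $\oplus$ is the bitwise XOR. The helper names are hypothetical.

```python
def index_two_subgroups(vertices):
    """Yield all index 2 subgroups of the XOR group `vertices` (ints as bit vectors)."""
    coords = {0: 0}   # element -> coordinate mask with respect to a greedily chosen basis
    rank = 0
    for v in sorted(vertices):
        if v not in coords:                       # v is independent of the span so far
            for w, c in list(coords.items()):
                coords[w ^ v] = c | (1 << rank)
            rank += 1
    for mask in range(1, 1 << rank):
        # kernel of the non-zero homomorphism x -> parity(coords(x) & mask)
        yield frozenset(v for v, c in coords.items()
                        if bin(c & mask).count("1") % 2 == 0)

def step(polytopes, n):
    """One step n -> n+1; every polytope is a frozenset of vertices in {0,1}^n."""
    top = 1 << n                                  # bit of the new coordinate x_{n+1}
    new = set()
    for V in polytopes:
        new.add(frozenset(V | {v | top for v in V}))          # prism P x [0,1]
        excluded = {frozenset(v for v in V if not (v >> i) & 1) for i in range(n)}  # the U_i
        for U in index_two_subgroups(V):
            if U not in excluded:
                new.add(frozenset(U | {v | top for v in V - U}))   # lift as in (4)
    return new

def counts(n_max):
    """Number of polytopes produced by the algorithm for n = 1, ..., n_max."""
    polytopes = {frozenset({0, 1})}               # initialization: the interval [0,1]
    result = [len(polytopes)]
    for n in range(1, n_max):
        polytopes = step(polytopes, n)
        result.append(len(polytopes))
    return result
```

For $n = 3$ the sketch returns the cube and the regular simplex with vertices $(0,0,0), (1,1,0), (1,0,1), (0,1,1)$, in agreement with the discussion preceding Proposition 11.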
Remark. Note that in the case $Q = P \times [0, 1]$, the number of vertices is doubled,
while in the other cases the number of vertices of $Q$ equals the number of vertices of
$P$. Furthermore, it is interesting to see that the two possible operations commute
in the following sense. Starting from some cube $W_n$, lifting it to $W_{n+1}$ and then
choosing a subgroup $U$ to apply the lift (4) gives the same polytope as choosing
the subgroup $\pi(U)$ from $W_n$ and then taking the prism over the lifted polytope,
where $\pi : \mathbb{R}^{n+1} \to \mathbb{R}^n$ is the canonical projection. Therefore, all polytopes that are
constructed by the algorithm can be thought of as lifted cubes $W_n$.

The classification will be complete with: 

Theorem 15. For all n £ N, the set *|3 n in Algorithm \14\ consists of all n- 
dimensional 0/1 polytopes that satisfy conditions and p7}). 

Proof. First we show that all polytopes $Q \in \mathfrak{P}_{n+1}$ satisfy conditions (i) and (ii), with $n$ replaced by $n + 1$. This is clear in the case of the prism $Q = P \times [0,1]$.

If $Q$ satisfies (4), then clearly $(V(Q), \oplus)$ is a subgroup of $(W_{n+1}, \oplus)$, because $U$ is a subgroup of $(V(P), \oplus)$ with $|V(P) : U| = 2$. Moreover, (ii) $\Rightarrow$ (i) in Proposition 12 implies that $Q$ has dimension $n + 1$, because $U \ne U_i$ for $1 \le i \le n$. Hence, $Q$ satisfies conditions (i) and (ii).

Vice versa, assume that $Q \subset \mathbb{R}^{n+1}$ fulfills (i) and (ii). Consider again the projection $\pi : \mathbb{R}^{n+1} \to \mathbb{R}^n$ onto the first $n$ coordinates, and put $P := \pi(Q)$. Since $Q$ has dimension $n + 1$, $P$ has dimension $n$. If $\pi|_{V(Q)}$ is not injective, then $Q$ is the prism $P \times [0,1]$, because $(V(Q), \oplus)$ is a subgroup of $(W_{n+1}, \oplus)$. If $\pi|_{V(Q)}$ is injective, put

$U := \{(x_1, \dots, x_n) \in V(P) \mid (x_1, \dots, x_n, 0) \in Q\}$.

Then $(U, \oplus)$ is a subgroup of $(V(P), \oplus)$ with $|V(P) : U| = 2$, because $Q$ has dimension $n + 1$. Moreover, equation (4) holds for $U$ as just defined. Finally, Proposition 12 (i) $\Rightarrow$ (ii) shows that $U \ne U_i$ for $1 \le i \le n$. Hence, our algorithm includes the determination of $Q$. □
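
The second half of the proof is constructive: from $Q$ one recovers $P$ and, in the injective case, the subgroup $U$. The following Python sketch illustrates this under assumptions that are ours: vertices are integer bitmasks with coordinate $i$ stored in bit $i-1$, so the last coordinate $x_{n+1}$ is bit $n$. The function name `decompose` is hypothetical.

```python
def decompose(VQ, n):
    """Recover (type, V(P), U) from the vertex set VQ of Q in {0,1}^(n+1).

    Returns ("prism", V(P), None) if the projection onto the first n
    coordinates is not injective on VQ, and ("lift", V(P), U) otherwise,
    where U consists of the projected vertices whose last coordinate is 0.
    """
    low = (1 << n) - 1  # mask selecting the first n coordinates
    VP = frozenset(x & low for x in VQ)
    if len(VP) < len(VQ):
        return ("prism", VP, None)
    U = frozenset(x & low for x in VQ if not (x >> n) & 1)
    return ("lift", VP, U)


# The tetrahedron {000, 110, 101, 011} is a lift of the square.
print(decompose(frozenset({0b000, 0b011, 0b101, 0b110}), 2))
# The 3-cube is the prism over the square.
print(decompose(frozenset(range(8)), 2))
```

The sketch assumes that its input already satisfies (i) and (ii); it does not check the group property or the dimension.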

As a first application of Theorem 15 we can count the number of $n$-dimensional polytopes that satisfy conditions (i) and (ii). Let $c_n := |\mathfrak{P}_n|$. For $1 \le k \le n$, let $c_n(k)$ denote the number of all 0/1 polytopes $P \subset \mathbb{R}^n$ with $|V(P)| = 2^k$ that satisfy (i) and (ii). Then one has obviously

(5) $c_n = \sum_{k=1}^{n} c_n(k)$.

We have $c_n(k) = 0$ for $2^k \le n$, because a polytope with at most $n$ vertices cannot have dimension $n$. Furthermore, we have clearly $c_n(n) = 1$ for all $n \in \mathbb{N}$.

As mentioned already in Example 13, a 0/1-polytope that satisfies (i), (ii), and $|V(P)| = 2^k$ has among its vertices exactly $2^k - 1$ subgroups of index 2. Hence, by ignoring the groups $U_i = V(P) \cap H_i$ for $1 \le i \le n$, we get

Corollary. For $k \le n < 2^k$ one has

$c_{n+1}(k) = c_n(k-1) + c_n(k)\,(2^k - n - 1)$.
The first few values are given in Table 3.

n \ k     1    2    3      4      5     6    7   8      c_n
  1       1                                               1
  2       0    1                                          1
  3       0    1    1                                     2
  4       0    0    5      1                              6
  5       0    0   15     16      1                      32
  6       0    0   30    175     42     1               248
  7       0    0   30   1605   1225    99    1         2960
  8       0    0    0  12870  31005  6769  219   1    50864

Table 3. The number of $n$-dimensional 0/1 polytopes with $2^k$ vertices that form a group.

It is easy to compute this number also for larger values of $n$. For instance

$c_{28} = 718897730072178204358180468879825453986397667929112558174208$

$c_{100} \approx 2.77 \cdot 10^{644}$
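
The recursion of the Corollary, together with $c_n(n) = 1$ and $c_n(k) = 0$ for $2^k \le n$, is straightforward to evaluate. The following Python sketch (function names `c` and `total` are our own choice) applies the Corollary with $n$ replaced by $n - 1$ and sums over $k$ as in (5).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def c(n, k):
    """Number c_n(k), computed via the recursion of the Corollary."""
    if k < 1 or k > n or 2 ** k <= n:
        return 0  # outside 1 <= k <= n, or too few vertices for dimension n
    if k == n:
        return 1  # the cube W_n
    # Corollary with n replaced by n - 1 (here k <= n - 1 < 2^k holds)
    return c(n - 1, k - 1) + c(n - 1, k) * (2 ** k - n)


def total(n):
    """c_n as the sum (5) over k."""
    return sum(c(n, k) for k in range(1, n + 1))


# Print the rows c_n(1), ..., c_n(n) and the totals c_n for n = 1..8.
for n in range(1, 9):
    print(n, [c(n, k) for k in range(1, n + 1)], total(n))
```

Python integers have arbitrary precision, so `total(28)` or `total(100)` can be evaluated in the same way and compared with the values quoted above.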

Finally, using the Corollary we can show that, among the full dimensional 0/1-polytopes with $2^k$ vertices, the convex hulls of linear codes are exceptional. For $1 \le k \le n$, let $d_n(k)$ denote the number of all 0/1 polytopes with $2^k$ vertices satisfying only condition (i). Hence, the number $d_n$ of all 0/1 polytopes of dimension $n$ trivially satisfies

(6) $d_n \ge \sum_{k=1}^{n} d_n(k)$.

Moreover, we get

Proposition 16. (i) For $4 \le n < 2^k < 2^n$, one has

(7) $d_n(k) \ge 2^k (2^n - 2^k)\, c_n(k) \ge n\, 2^{n-1} c_n(k)$.

(ii) We have

$\lim_{n \to \infty} \frac{c_n}{d_n} = 0$.
Proof. (i) Suppose that $U$ is a proper subgroup of $(W_n, \oplus)$ with $\dim(\operatorname{conv}(U)) = n$ and $|U| = 2^k$.

If $U'$ is another subgroup of $(W_n, \oplus)$ with $|U'| = |U|$, then we have

$|U \cap U'| \le 2^{k-1} < 2^{n-1}$

and, hence,

$|U \setminus U'| \ge 2^{k-1} > \frac{n}{2} \ge 2$.

This means

(8) $|U \setminus U'| \ge 3$.

There are $2^k (2^n - 2^k)$ subsets $V$ of $W_n$ with $|V| = 2^k$ and $|V \setminus U| = |U \setminus V| = 1$; namely, these are all sets of the form

(9) $V = (U \setminus \{u_0\}) \cup \{v_0\}$ with $u_0 \in U$, $v_0 \in W_n \setminus U$.

For $V$ as in (9), we get $\dim(\operatorname{conv}(V)) = n$, because otherwise $U \setminus \{u_0\}$ would be contained in a unique hyperplane $H$ with $v_0 \in H$, a contradiction to $v_0 \notin U$. Together with (8) we obtain the first inequality in (7). The second one is trivial in view of $n < 2^k \le 2^{n-1}$.
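(ii) By (5), (6), and (7) we get for $n \ge 4$:

$\frac{c_n}{d_n} \le \frac{2 \sum_{k=1}^{n-1} c_n(k)}{d_n} \le 2 \Big( \sum_{k=1}^{n-1} c_n(k) \Big) \Big( \sum_{k=1}^{n-1} d_n(k) \Big)^{-1} \le 2\, (n\, 2^{n-1})^{-1} = \frac{2^{2-n}}{n}$.

This proves the second statement. □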
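
As a small numerical illustration of Proposition 16, using only the value $c_4(3) = 5$ from Table 3 and the final bound $2^{2-n}/n$ from the proof of (ii), one can evaluate both statements for concrete parameters. The choice $n = 4$, $k = 3$ is ours; it satisfies $4 \le n < 2^k < 2^n$.

```latex
\[
  2^k (2^n - 2^k) = 8 \cdot (16 - 8) = 64 \;\ge\; n\, 2^{n-1} = 4 \cdot 8 = 32,
  \qquad
  d_4(3) \;\ge\; 64 \cdot c_4(3) = 64 \cdot 5 = 320 .
\]
\[
  \frac{c_8}{d_8} \;\le\; \frac{2^{2-8}}{8} = \frac{1}{512} .
\]
```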



As a concluding remark, we study the question of constructing a statistical model from a given polytope. Assume $P \subset \mathbb{R}^n$ satisfies (i) and (ii). When does it come from a hierarchical model? To begin with, observe that the number $m$ of vertices of $P$ is a power of 2, since it must divide the number of vertices of the cube $W_n$. We write $m = 2^N$.



By Theorem 15 we know that $P$ can be constructed using the algorithm. It is constructed from the $N$-cube by applying several steps of the second type in Algorithm 14. Therefore, every such polytope corresponds to a subset of columns $\{1\}, \dots, \{N\}, A_{N+1}, \dots, A_s$ in the matrix of coordinates, or equivalently to a pre-hypergraph. However, this pre-hypergraph is not unique. If we are given only a list of vertices, then there are several possibilities to choose a generator matrix
in standard form. As an example consider the polytope ^2,3 : 



x          {1}  {2}  {3}  {1,2}  {1,3}  {2,3}
(0,0,0)     0    0    0     0      0      0
(0,0,1)     0    0    1     0      1      1
(0,1,0)     0    1    0     1      0      1
(0,1,1)     0    1    1     1      1      0
(1,0,0)     1    0    0     1      1      0
(1,0,1)     1    0    1     1      0      1
(1,1,0)     1    1    0     0      1      1
(1,1,1)     1    1    1     0      0      0

In fact, every 3 by 3 submatrix which is, after permuting rows and columns, the identity matrix gives a generator matrix in standard form. Obviously, there are several such choices. Here, for instance, we can choose the canonical basis corresponding to the atoms:



x          {1}  {2}  {3}  {1,2}  {1,3}  {2,3}
(1,0,0)     1    0    0     1      1      0
(0,1,0)     0    1    0     1      0      1
(0,0,1)     0    0    1     0      1      1



On the other hand, we can also reorder the columns and choose 



x          {3}  {1,2}  {1,3}  {1}  {2}  {2,3}
(1,1,1)     1     0      0     1    1     0
(0,1,0)     0     1      0     0    1     1
(1,1,0)     0     0      1     1    1     1



When the generator matrix is chosen, one can apply the method given in Section 4 to construct a pre-hypergraph. For our first generator matrix we get back the hypergraph we started with; for the second choice we read off the pre-hypergraph

$\mathcal{A}' := \{\{1'\}, \{2'\}, \{3'\}, \{1',3'\}, \{1',2',3'\}, \{2',3'\}\}$,

where we have introduced new units $\{1', 2', 3'\}$ corresponding to the first three columns. This shows the ambiguity due to the choice of a particular generator matrix if only the code is given.
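
The two choices above can be reproduced mechanically. The following Python sketch rests on two assumptions of ours that are consistent with the tables shown: the entry of the coordinate matrix in row $x$ and column $A$ is the parity of $\sum_{i \in A} x_i$, and the pre-hypergraph is read off as the list of column supports of the generator matrix, with row $j$ relabelled as the new unit $j'$. The function names are hypothetical.

```python
def coordinate_row(x, column_sets):
    """Row of the coordinate matrix: parity of x on each set A."""
    return [sum(x[i - 1] for i in A) % 2 for A in column_sets]


def read_off_prehypergraph(basis, column_sets):
    """Supports of the columns of the generator matrix with rows `basis`.

    Row j of the matrix (0-based) becomes the new unit j + 1.
    """
    rows = [coordinate_row(x, column_sets) for x in basis]
    return [
        {j + 1 for j, row in enumerate(rows) if row[col] == 1}
        for col in range(len(column_sets))
    ]


# First choice: canonical basis, columns in the original order.
first = read_off_prehypergraph(
    [(1, 0, 0), (0, 1, 0), (0, 0, 1)],
    [{1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}],
)
# Second choice: rows (1,1,1), (0,1,0), (1,1,0) with reordered columns.
second = read_off_prehypergraph(
    [(1, 1, 1), (0, 1, 0), (1, 1, 0)],
    [{3}, {1, 2}, {1, 3}, {1}, {2}, {2, 3}],
)
print(first)
print(second)
```

The first call returns the hypergraph we started with, the second returns the sets of $\mathcal{A}'$ in the order of the reordered columns.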